Abstract

Distributed Arithmetic Coding (DAC) has proved to be an effective implementation of Slepian-Wolf
Coding (SWC), especially for short data blocks. To study the properties of DAC codewords, the author
previously proposed the concept of DAC codeword spectrum¹. For equiprobable binary sources, the problem
was formulated as solving a system of functional equations. Then, to calculate the DAC codeword spectrum
in general cases, three approximation methods were proposed. In this paper, the author uses the DAC
codeword spectrum as a tool to answer an important question: how many paths (both proper and
wrong) will be created during DAC decoding if no path is pruned? The author introduces
another kind of DAC codeword spectrum, the time spectrum, while the originally proposed
DAC codeword spectrum is called the path spectrum from now on. To measure how fast the number of
decoding paths increases, the author introduces the expansion factor, which is defined as
the ratio of path numbers between two consecutive decoding stages. The author reveals the relation
between expansion factor and path/time spectrum, and proves that the number of decoding paths of any
DAC codeword increases exponentially as the decoding proceeds. Specifically, when symbols '0' and
'1' are mapped onto intervals [0, q) and [1 - q, 1), where 0.5 < q < 1, the author proves that the
expansion factor converges to 2q as the decoding proceeds.

Index Terms 

Distributed Source Coding (DSC), Slepian-Wolf Coding (SWC), Distributed Arithmetic Coding 
(DAC), Codeword Spectrum. 
¹ In the previous papers on this topic, the author used the term "codeword distribution." To avoid the
awkward phrase "distribution of distributed ...", the author uses "codeword spectrum" instead from now on.
I. Introduction

Arithmetic Coding (AC) [1] and its fast implementation, Quasi AC (QAC) [2], have been widely
used for data compression because of their entropy-approaching performance. To deal with noisy
transmission, AC can be extended in two ways to implement Joint Source-Channel Coding
(JSCC): one is to introduce forbidden intervals corresponding to forbidden symbols [3], [4], e.g.
Error-Correcting AC (ECAC), which has been used for image and video transmission [5], [6],
[7], [8]; the other is to insert markers into the sequence of source symbols at fixed positions
[9]. Recently, to deal with Slepian-Wolf Coding (SWC) [10], AC has also been extended in
two ways: one is to introduce overlapped intervals corresponding to ambiguous symbols, e.g.
Distributed AC (DAC) [11], [12] and Overlapped QAC (OQAC) [13]; the other is to puncture
some bits of the AC bitstream, e.g. Punctured QAC (PQAC) [14]. There are also some variants of the
DAC, e.g. Time-Shared DAC (TS-DAC) [15] for symmetric SWC, rate-compatible DAC [16],
decoder-driven adaptive DAC [17] for online estimation of source statistics, etc. Most recently,
a DAC implementation of Distributed Joint Source-Channel Coding (DJSCC) has also appeared
[18].

Let $x$ be the source and $y$ the decoder Side Information (SI). It is well known that
the rate of AC can approach the source entropy $H(x)$. However, to the best
of the author's knowledge, no analysis of the performance of the ECAC or the DAC can be found
in the literature so far. For the ECAC, it is unknown whether the rate can approach
the limit $H(x)/C$, where $C$ is the channel capacity. For the DAC, it is unknown whether the rate
can approach the limit $H(x|y)$ [10]. For the DAC-based DJSCC, it remains an open question
whether the rate can approach the limit $H(x|y)/C$.

This paper is devoted to the performance analysis of the DAC. In the author's opinion,
answering the question of whether the rate of the DAC can approach the limit $H(x|y)$ has
two prerequisites. First, one needs to know how many paths will be created as the
DAC decoding proceeds. Second, one needs to know the Probability Density Function (PDF) of
the Hamming distances between those decoding paths and the source.

In [19], the author introduced the concept of codeword spectrum, which is a function defined
over the interval $[0, 1)$. For the DAC codeword spectrum of equiprobable binary sources along proper
decoding paths, the problem was formulated as solving a system of functional equations including
four constraints. Then, three approximation methods were proposed in [20] for calculating
the DAC codeword spectrum, i.e. numeric approximation, polynomial approximation, and Gaussian
approximation. Although the concept of DAC codeword spectrum is appealing, it has found no
practical use so far.

In this paper, by using DAC codeword spectrum as a tool, the author answers an important 
question: how many (proper and wrong) paths will be created as the DAC decoding proceeds? 
This is the first application of DAC codeword spectrum up to now. Through this work, the author 
expects to find more applications of DAC codeword spectrum in the future. 

This paper is organized as follows. Section II briefly introduces the principle of the DAC codec.
Section III introduces the concepts used in the following analyses, e.g. path
spectrum, time spectrum, population, expansion factor, etc., and reveals the relations between
expansion factor and path/time spectrum. Section IV studies the evolution and numeric
calculation of time spectrum. Section V reports experimental and theoretical results for expansion
factor. Finally, Section VI concludes this paper.

II. Principle of Distributed Arithmetic Coding 

A. Encoding 

Consider an infinite-length, stationary, and equiprobable binary source $x = \{x_i\}_{i=1}^{\infty}$. Let
$y = \{y_i\}_{i=1}^{\infty}$ be the decoder SI, where $\Pr(x_i \neq y_i) = p$. The DAC encoder [11], [12] iteratively maps
source symbols '0' and '1' onto intervals $[0, q)$ and $[(1 - q), 1)$, where $q = 2^{-\alpha}$. We call $\alpha$ the
overlapping factor, which satisfies

$$\frac{H(x|y)}{H(x)} = H(x|y) < \alpha < 1. \tag{1}$$

The resulting codeword of $x$ is denoted by $C_\alpha(x)$. The DAC encoding process is in fact a
transform that converts source $x$ into codeword $C_\alpha(x)$. We denote the rate of $C_\alpha(x)$ by $R_\alpha(x)$.
It is easy to obtain

$$R_\alpha(x) = \alpha H(x) = \alpha > H(x|y). \tag{2}$$
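As a rough numerical illustration of (1) and (2), the sketch below computes $q$ and checks the admissible range of $\alpha$. It assumes that the correlation between $x$ and $y$ is memoryless and symmetric, so that $H(x|y)$ equals the binary entropy of $p$; the function names are hypothetical and not part of any DAC implementation.

```python
import math

def binary_entropy(p: float) -> float:
    """Binary entropy H_b(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def dac_parameters(alpha: float, p: float) -> dict:
    """Return q and the rate for overlapping factor alpha, checking constraint (1)."""
    # Assumption: memoryless symmetric correlation, so H(x|y) = H_b(p).
    h_cond = binary_entropy(p)
    if not (h_cond < alpha < 1.0):
        raise ValueError("alpha must satisfy H(x|y) < alpha < 1")
    q = 2.0 ** (-alpha)          # interval width, 0.5 < q < 1
    return {"q": q, "rate": alpha, "h_cond": h_cond}

params = dac_parameters(alpha=0.5, p=0.05)
print(params)
```

With $\alpha = 0.5$, the intervals have width $q = 2^{-0.5} \approx 0.707$ and overlap on $[1 - q, q)$.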

B. Decoding 

The DAC decoder works in a symbol-driven mode. Because $(1 - q) < q$ when $q \in (0.5, 1)$,
intervals $[0, q)$ and $[(1 - q), 1)$ partially overlap. Although this overlap leads to a larger
final interval and hence a shorter codeword, it comes at the cost of ambiguity during decoding.
To describe the DAC decoding process, a ternary symbol set $\{0, A, 1\}$ is defined,
where $A$ represents the ambiguous symbol. Once symbol $A$ is met, the decoder performs
a branching: two candidate paths are created, corresponding to the two alternative symbols '0' and
'1'. Therefore, as the decoding proceeds, more and more paths are created.
Among them, only one is the proper path corresponding to source $x$. We denote the $j$-th path
by $x_j = \{x_{ji}\}_{i=1}^{\infty}$, where $x_{ji}$ is the $i$-th symbol along path $x_j$.

For path $x_j$, when decoding symbol $x_{ji}$, the state of a $B$-bit DAC decoder is described by
the parameter set $(l_{ji}, h_{ji}, c_{ji})$, where $l_{ji}$, $h_{ji}$, and $c_{ji}$ are $B$-bit integers. $l_{ji}$ and $h_{ji}$ are the lower and
upper bounds of the range at time $i$, and $c_{ji}$ is the $B$-bit codeword in the buffer at time $i$. Obviously,

$$0 \le l_{ji} \le c_{ji} \le h_{ji} \le (2^B - 1). \tag{3}$$

Let

$$u_{ji} = \frac{c_{ji} - l_{ji}}{h_{ji} - l_{ji} + 1}. \tag{4}$$

Then

$$x_{ji} = \begin{cases} 0, & 0 \le u_{ji} < (1 - q) \\ A, & (1 - q) \le u_{ji} < q \\ 1, & q \le u_{ji} < 1 \end{cases} \tag{5}$$

If $x_{ji} = A$, then two candidate paths are created, corresponding to symbols '0' and '1',
respectively. For each path, its metric is updated according to SI $y$, and its corresponding
sub-interval is selected for the next iteration. To maintain linear complexity, each time a symbol is
decoded, the decoder uses the $M$-algorithm to retain at most $M$ paths with the best
partial metrics and prunes the others [11], [12]. Finally, after all source symbols are decoded, the
path with the best overall metric is output as the estimate of $x$.
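The decision rule (5) and the branching step can be sketched as follows. This is an idealized, real-valued model rather than the $B$-bit integer decoder: it assumes infinite precision, ignores renormalization, metric updates and pruning, and rescales the selected sub-interval back onto $[0, 1)$. The function names are hypothetical.

```python
def decode_symbol(u: float, q: float) -> str:
    """Classify the normalized codeword position u in [0, 1) as in (5)."""
    if u < 1.0 - q:
        return "0"
    if u < q:
        return "A"          # ambiguous: both '0' and '1' are possible
    return "1"

def branch(u: float, q: float) -> list:
    """Return (symbol, next_u) for every candidate path leaving this state."""
    symbol = decode_symbol(u, q)
    children = []
    if symbol in ("0", "A"):
        # Sub-interval [0, q) is rescaled onto [0, 1).
        children.append(("0", u / q))
    if symbol in ("1", "A"):
        # Sub-interval [1 - q, 1) is rescaled onto [0, 1).
        children.append(("1", (u - (1.0 - q)) / q))
    return children

print(branch(0.5, 0.75))    # an ambiguous position produces two children
```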

III. Preliminaries 

A. Definitions 

With the help of Fig. 1, we give the definitions of path spectrum, population, time spectrum,
and expansion factor in turn as follows.
Fig. 1. Illustration of the concepts of path, path spectrum, population, time spectrum, and expansion factor. In this example,
there are four decoding paths: $x_1$ = "1110", $x_2$ = "1100", $x_3$ = "1001", and $x_4$ = "1000", each of which corresponds
to its path spectrum, e.g., the path spectrum along path $x_4$ is the PDF of $u_{4*} = (u_{11}, u_{12}, u_{23}, u_{34}, u_{45})$. The population
increases as the decoding proceeds, e.g., there are three paths after time $i = 3$, so $J_3 = 3$ (initially, $J_0 = 1$). Each decoding
time corresponds to a time spectrum, e.g., the time spectrum at time $i = 3$ is the PDF of $u_{*3} = \{u_{13}, u_{23}\}$. According to the
definition of expansion factor, we have $\gamma_1 = J_1/J_0 = 1$, $\gamma_2 = 2$, $\gamma_3 = 3/2$, etc.



Definition (Path Spectrum): When decoding $C_\alpha(x)$ along path $x_j$, the PDF of $u_{j*} = \{u_{ji}\}_{i=1}^{\infty}$
is called the path spectrum of $C_\alpha(x)$ along path $x_j$.

For example, in Fig. 1 there are four decoding paths, each of which corresponds to its path
spectrum. In particular, we are interested in the path spectrum along the proper decoding path $x$,
which is denoted by $f(u)$, where $u \in [0, 1)$. According to [19], $f(u)$ should satisfy the following
constraints

$$
\begin{cases}
\int_0^1 f(u)\,du = 1 \\
f(u) = f(1 - u) \\
f(u) = \dfrac{f(u/q)}{2q}, & 0 \le u < (1 - q) \\
f(u) = \dfrac{f(u/q) + f\left(\frac{u - (1 - q)}{q}\right)}{2q}, & (1 - q) \le u < q \\
f(u) = \dfrac{f\left(\frac{u - (1 - q)}{q}\right)}{2q}, & q \le u < 1
\end{cases} \tag{6}
$$

As for the calculation of $f(u)$, three approximation methods have been proposed in [20].
Definition (Population): The number of paths after decoding the $i$-th symbol is called the
population at time $i$, which is denoted by $J_i$.

As there is only one path before decoding the first symbol, we have $J_0 = 1$. For an
example of population, please refer to Fig. 1.



Definition (Time Spectrum): When decoding the $i$-th symbol, there are $J_{i-1}$ paths. We call the
PDF of $u_{*i} = \{u_{ji}\}_{j=1}^{J_{i-1}}$ the time spectrum of $C_\alpha(x)$ at time $i$, which is denoted by $g_i(u)$.

Please refer to Fig. 1 for an example of time spectrum. As $g_i(u)$ is the PDF of $u$, the
normalization property should hold, i.e.

$$\int_0^1 g_i(u)\,du = 1. \tag{7}$$

In addition, as we are investigating equiprobable binary sources, the symmetry property should
also hold, i.e.

$$g_i(u) = g_i(1 - u). \tag{8}$$

When decoding the first symbol, there is only one path, which is undoubtedly the proper path.
Hence, from the statistical view, the time spectrum at time $i = 1$ is equivalent to the path
spectrum along the proper decoding path $x$, i.e.

$$g_1(u) = f(u). \tag{9}$$

Definition (Expansion Factor): We define the expansion factor at time $i$ as the ratio of the
expectation of $J_i$ to that of $J_{i-1}$, which is denoted by $\gamma_i$, i.e. $\gamma_i = \frac{E(J_i)}{E(J_{i-1})}$.
Please refer to Fig. 1 for an example of expansion factor.
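The Fig. 1 example can be reproduced with a few lines of arithmetic. The sketch below takes the ratio of consecutive populations for a single decoding trace, as in the figure caption; the definition itself uses expectations, and the helper name is hypothetical.

```python
from fractions import Fraction

def expansion_factors(populations):
    """gamma_i = J_i / J_{i-1} for a single population trace J_0, J_1, ..."""
    return [Fraction(cur, prev) for prev, cur in zip(populations, populations[1:])]

# Populations read off the Fig. 1 example: J_0 = 1, J_1 = 1, J_2 = 2, J_3 = 3.
fig1_populations = [1, 1, 2, 3]
print(expansion_factors(fig1_populations))
```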



B. Relations between Population, Expansion Factor, and Time Spectrum 

When $u_{ji}$ falls into $[(1 - q), q)$, two branches are created, or in other words, one more path
is created. Therefore, if there are $J_{i-1}$ paths at time $(i - 1)$, then from the statistical view,
$J_{i-1}\int_{1-q}^{q} g_i(u)\,du$ more paths will be created at time $i$ on average, i.e.

$$E(J_i) = E(J_{i-1})\left(1 + \int_{1-q}^{q} g_i(u)\,du\right). \tag{10}$$

Therefore, the expansion factor at time $i$ is

$$\gamma_i = \frac{E(J_i)}{E(J_{i-1})} = 1 + \int_{1-q}^{q} g_i(u)\,du. \tag{11}$$

In particular, as $g_1(u) = f(u)$ and $J_0 = 1$, we have

$$\gamma_1 = E(J_1) = 1 + \int_{1-q}^{q} f(u)\,du. \tag{12}$$

Then recursively, we have

$$E(J_i) = \prod_{i'=1}^{i} \gamma_{i'}. \tag{13}$$

From the above analyses, we can see that the time spectrum $g_i(u)$ is the key quantity.
Once we know $g_i(u)$, the expansion factor $\gamma_i$ can be obtained and then the expected population $E(J_i)$
can be deduced in turn.
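Relations (11) and (13) translate directly into a small numerical routine. The sketch below is only an illustration: it integrates a user-supplied density over $[1 - q, q)$ with the midpoint rule and multiplies expansion factors to get the expected population. The uniform density used in the example is an assumption chosen for simplicity, not the true time spectrum of a DAC codeword.

```python
import math

def expansion_factor(g, q: float, cells: int = 10_000) -> float:
    """Approximate (11): 1 + integral of g over [1 - q, q), by the midpoint rule."""
    lo, hi = 1.0 - q, q
    width = (hi - lo) / cells
    area = sum(g(lo + (k + 0.5) * width) for k in range(cells)) * width
    return 1.0 + area

def expected_population(gammas) -> float:
    """Apply (13): E(J_i) is the product of the expansion factors up to time i."""
    return math.prod(gammas)

uniform = lambda u: 1.0      # assumed density, for illustration only
gamma = expansion_factor(uniform, q=0.7)
print(gamma, expected_population([gamma] * 10))
```

For a uniform density the integral equals $2q - 1$, so the expansion factor is $2q$.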

IV. Time Spectrum 

A. Evolution 

With the help of Fig. 2, we illustrate how the time spectrum evolves as the decoding proceeds.
Let $g_i(u)$ be the time spectrum at time $i$. If $0 \le u < q$, then the 0-branch is created and
interval $[0, q)$ at time $i$ is mapped onto interval $[0, 1)$ at time $(i + 1)$. This means that the
part of $g_i(u)$ over interval $0 \le u < q$ is mapped onto $g_i(qu)$ over interval $0 \le u < 1$ at
the next iteration (Fig. 2). Similarly, if $(1 - q) \le u < 1$, then the 1-branch is created and
interval $[(1 - q), 1)$ at time $i$ is mapped onto interval $[0, 1)$ at time $(i + 1)$. Meanwhile, the
part of $g_i(u)$ over interval $(1 - q) \le u < 1$ is mapped onto $g_i(qu + (1 - q))$ over interval
$0 \le u < 1$ at the next iteration (Fig. 2). Finally, the time spectrum at time $(i + 1)$ should be the
normalized sum of $g_i(qu)$ and $g_i(qu + (1 - q))$ (Fig. 2), i.e.

$$g_{i+1}(u) = \beta_i\left(g_i(qu) + g_i(qu + (1 - q))\right), \tag{14}$$

where $\beta_i$ is introduced to make sure $\int_0^1 g_{i+1}(u)\,du = 1$. It is easy to obtain

$$\beta_i = \frac{q}{1 + \int_{1-q}^{q} g_i(u)\,du} = \frac{q}{\gamma_i}. \tag{15}$$
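The value of the normalization coefficient in (15) follows from a change of variables. A short derivation, using only (7), (11), and (14):

```latex
\begin{align*}
\int_0^1 g_i(qu)\,du &= \frac{1}{q}\int_0^{q} g_i(v)\,dv, \\
\int_0^1 g_i(qu + (1-q))\,du &= \frac{1}{q}\int_{1-q}^{1} g_i(v)\,dv, \\
\int_0^{q} g_i(v)\,dv + \int_{1-q}^{1} g_i(v)\,dv
  &= \int_0^1 g_i(v)\,dv + \int_{1-q}^{q} g_i(v)\,dv = \gamma_i, \\
1 = \int_0^1 g_{i+1}(u)\,du &= \beta_i\,\frac{\gamma_i}{q}
  \quad\Longrightarrow\quad \beta_i = \frac{q}{\gamma_i}.
\end{align*}
```

The overlap region $[1 - q, q)$ is counted twice in the third line, which is exactly where the extra term of the expansion factor comes from.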
As $i$ approaches infinity, we have

$$g_\infty(u) = \beta_\infty\left(g_\infty(qu) + g_\infty(qu + (1 - q))\right), \quad \forall u \in [0, 1). \tag{16}$$
Fig. 2. Illustration of the evolution of time spectrum. In this example, $q = 1/\sqrt{2}$ and $i = 1$, so $g_i(u) = g_1(u) = f(u)$, where
the closed form of $f(u)$ has been obtained in [19]. When $0 \le u < q$, the 0-branch is created and then interval $[0, q)$ is
mapped onto interval $[0, 1)$ at the next iteration (as shown by $g_i(qu)$). Similarly, when $(1 - q) \le u < 1$, the 1-branch is
created and then interval $[(1 - q), 1)$ is mapped onto interval $[0, 1)$ at the next iteration (as shown by $g_i(qu + (1 - q))$).
Therefore, $g_{i+1}(u)$ should be the normalized sum of $g_i(qu)$ and $g_i(qu + (1 - q))$.



Hence,

$$g_\infty(u) = 1, \quad \forall u \in [0, 1). \tag{17}$$

This means that, as the decoding proceeds, the time spectrum converges to the uniform distribution.
Meanwhile, we can also obtain $\beta_\infty = 1/2$. Finally,

$$\gamma_\infty = 1 + \int_{1-q}^{q} g_\infty(u)\,du = 2q = 2^{1-\alpha}. \tag{18}$$
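As a consistency check, one can substitute the uniform density into (16) and (18) and confirm that the stated constants agree with (15). This only verifies that the uniform density is a fixed point; it does not by itself establish uniqueness or the rate of convergence.

```latex
\begin{align*}
g_\infty(u) = 1 \;\Rightarrow\;
\beta_\infty\bigl(g_\infty(qu) + g_\infty(qu + (1-q))\bigr) &= 2\beta_\infty = 1
  \;\Rightarrow\; \beta_\infty = \tfrac{1}{2}, \\
\gamma_\infty = 1 + \int_{1-q}^{q} 1\,du &= 1 + (2q - 1) = 2q, \\
\beta_\infty = \frac{q}{\gamma_\infty} &= \frac{q}{2q} = \tfrac{1}{2}.
\end{align*}
```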



B. Discussion 



Intuitively, $E(J_i)$ reflects the residual uncertainty of $x$ given its DAC codeword $C_\alpha(x)$.
Therefore, the conditional entropy of $x$ given $C_\alpha(x)$ can be calculated by

$$H(x|C_\alpha(x)) = \lim_{i \to \infty} \frac{\log_2 E(J_i)}{i}. \tag{19}$$

According to (13), we have

$$\log_2 E(J_i) = \log_2 \prod_{i'=1}^{i} \gamma_{i'} = \sum_{i'=1}^{i} \log_2 \gamma_{i'}. \tag{20}$$

Thus, because $\gamma_{i'}$ converges to $\gamma_\infty = 2^{1-\alpha}$,

$$H(x|C_\alpha(x)) = \lim_{i \to \infty} \frac{\sum_{i'=1}^{i} \log_2 \gamma_{i'}}{i} = 1 - \alpha. \tag{21}$$

It is obvious that

$$H(x|C_\alpha(x)) = H(x) - I(x; C_\alpha(x)), \tag{22}$$

i.e.

$$1 - \alpha = 1 - I(x; C_\alpha(x)). \tag{23}$$

Thus, we obtain

$$I(x; C_\alpha(x)) = \alpha. \tag{24}$$

Since $C_\alpha(x)$ is the codeword of $x$, the mutual information between $C_\alpha(x)$ and $x$ is just the
partial information of $x$ provided by $C_\alpha(x)$. Recall that the rate of $C_\alpha(x)$ is $R_\alpha(x) = \alpha$, so

$$R_\alpha(x) = I(x; C_\alpha(x)). \tag{25}$$

This means that the rate of a DAC codeword equals the mutual information between it and the
coded source, or in other words, any rate-$\alpha$ DAC codeword conveys $\alpha$ bits of information about the
coded source on average.
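To give a feel for the magnitudes involved, the sketch below evaluates the asymptotic quantities in (18) to (24) for a chosen overlapping factor. It ignores the initial transient of the expansion factor, so the path-count figure is only an order-of-magnitude estimate; the function name and dictionary keys are hypothetical.

```python
import math

def residual_uncertainty(alpha: float, n_symbols: int) -> dict:
    """Asymptotic figures implied by (18)-(24) for an equiprobable binary source."""
    q = 2.0 ** (-alpha)
    gamma_inf = 2.0 * q                      # limiting expansion factor, (18)
    h_cond = math.log2(gamma_inf)            # H(x|C) per symbol, (21)
    return {
        "gamma_inf": gamma_inf,
        "H(x|C)": h_cond,
        "I(x;C)": 1.0 - h_cond,              # (22)-(24) with H(x) = 1
        "log2_paths": n_symbols * h_cond,    # approx. log2 E(J_n), transient ignored
    }

print(residual_uncertainty(alpha=0.5, n_symbols=1024))
```

For $\alpha = 0.5$ and 1024 symbols, the unpruned decoding tree holds on the order of $2^{512}$ paths, which is why practical decoders must prune.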

C. Numeric Approximation 

As with the path spectrum $f(u)$, finding the closed form of the time spectrum $g_i(u)$ is not easy.
Thus, inspired by the work in [20], the author proposes a numeric method for calculating $g_i(u)$.
This method is described in detail below.

1) Discretization: We divide the interval $[0, 1]$ into $N$ uniform cells. Let $\Delta = 1/N$. Then
$g_i(u)$ can be approximated by $g_i(n\Delta)$, where $n \in \mathbb{Z}_N = \{0, 1, \ldots, N\}$, for a large $N$.

2) Initialization: Before iteration, we set $g_1(n\Delta) = f(n\Delta)$, $\forall n \in \mathbb{Z}_N$, where $f(n\Delta)$ can be
obtained by the method given in [20].

3) Update: Recursively, $g_{i+1}(n\Delta)$ can be obtained from $g_i(n\Delta)$ by (we omit coefficient $\beta_i$)

$$g_{i+1}(n\Delta) = g_i(\mathrm{round}(nq)\Delta) + g_i(\mathrm{round}(nq + N(1 - q))\Delta). \tag{26}$$
Fig. 3. Theoretical and experimental results of expansion factor for $q = 0.6$, $0.7$, and $0.8$. A 31-bit DAC codec is used for the
experiments and the results are averaged over $10^4$ DAC codewords of various length-1024 equiprobable binary sequences. For
the theoretical results, the number of cells is $N = 10^5$. The software for the theoretical results is available on the author's homepage.



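4) Normalization: As $\int_0^1 g_{i+1}(u)\,du = 1$, we have $\sum_{n=0}^{N} g_{i+1}(n\Delta)\Delta = 1$, i.e.

$$\sum_{n=0}^{N} g_{i+1}(n\Delta) = 1/\Delta = N. \tag{27}$$

Hence the values $g_{i+1}(n\Delta)$ obtained from (26) should be normalized by the sum of all samples as

$$g_{i+1}(n\Delta) \leftarrow \frac{N}{\sum_{n'=0}^{N} g_{i+1}(n'\Delta)}\, g_{i+1}(n\Delta). \tag{28}$$

5) Expansion Factor: Let $L = \mathrm{round}(N(1 - q))$ and $H = \mathrm{round}(Nq)$, then the expansion
factor at time $(i + 1)$ can be calculated by

$$\gamma_{i+1} = 1 + \frac{1}{N}\sum_{n=L}^{H} g_{i+1}(n\Delta). \tag{29}$$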
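Steps 1) to 5) can be sketched in a few lines of plain Python. This is a minimal illustration, not the author's software: the path spectrum $f(u)$ from [20] is not reproduced here, so the example seeds the iteration with a hypothetical symmetric Beta(2, 2) density, and the function name, grid size, and index clamping are all assumptions.

```python
def time_spectrum_expansion(q: float, seed, n_cells: int = 1000, steps: int = 30):
    """Iterate steps 1)-5) and return [gamma_1, gamma_2, ..., gamma_{steps+1}]."""
    N = n_cells
    L, H = round(N * (1.0 - q)), round(N * q)

    def normalize(g):
        total = sum(g)                          # unnormalized sum of all samples
        return [N * v / total for v in g]       # rescale so the sum equals N, (27)-(28)

    def gamma(g):
        return 1.0 + sum(g[L:H + 1]) / N        # (29)

    # Steps 1)-2): sample the seed density on the grid n * Delta, n = 0..N.
    g = normalize([seed(n / N) for n in range(N + 1)])
    gammas = [gamma(g)]
    for _ in range(steps):
        # Step 3), update (26); indices are clamped to stay on the grid.
        nxt = []
        for n in range(N + 1):
            a = min(N, round(n * q))
            b = min(N, round(n * q + N * (1.0 - q)))
            nxt.append(g[a] + g[b])
        g = normalize(nxt)                      # step 4)
        gammas.append(gamma(g))                 # step 5)
    return gammas

# Hypothetical seed: a symmetric Beta(2, 2) density, NOT the true f(u) from [20].
gammas = time_spectrum_expansion(q=0.7, seed=lambda u: 6.0 * u * (1.0 - u))
print(gammas[0], gammas[-1])
```

With this seed and $q = 0.7$, the first value reflects the seed density, while the later values settle near $2q = 1.4$ up to discretization error.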

V. Simulation Results 

Fig. 3 includes some theoretical and experimental results of expansion factor. For the theoretical
results, the author first calculates the path spectrum along the proper decoding path, $f(u)$, through
the numeric method given in [20], where the number of cells is set to $N = 10^5$. Then, seeded
with $f(u)$, the numeric method given in Section IV-C is run to obtain $g_i(u)$, where the number
of cells is also set to $N = 10^5$. Finally, the expansion factor at time $i$ is obtained by (29).

For the experimental results, a 31-bit DAC codec is used to encode $10^4$ various length-1024
equiprobable binary sequences. Then these codewords are decoded. The decoder first counts the
number of length-$i$ paths (i.e. only $i$ symbols are decoded for each path), $J_i$, through full search.
Then the expansion factor at time $i$ can be obtained by $\gamma_i = \frac{E(J_i)}{E(J_{i-1})}$, where $E(J_i)$ means the
average of $J_i$ over $10^4$ DAC codewords.

From Fig. 3, the reader can find that the theoretical results coincide with the experimental
results perfectly. Both theoretical and experimental curves converge to $2q$ rapidly, meaning that
the above analyses are well verified.
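A full-search experiment of this kind can be imitated with an idealized, real-valued model of the decoder. The sketch below is not the 31-bit codec used for Fig. 3: it assumes infinite precision, and it draws the initial normalized position uniformly on $[0, 1)$ instead of taking it from real DAC codewords, so it shows only the limiting behaviour and not the transient. All names are hypothetical.

```python
import random

def path_counts(u0: float, q: float, depth: int) -> list:
    """Full search without pruning: return [J_0, J_1, ..., J_depth] for one start value."""
    states = [u0]
    counts = [1]
    for _ in range(depth):
        nxt = []
        for u in states:
            if u < q:                   # '0' is admissible: [0, q) -> [0, 1)
                nxt.append(u / q)
            if u >= 1.0 - q:            # '1' is admissible: [1 - q, 1) -> [0, 1)
                nxt.append((u - (1.0 - q)) / q)
        states = nxt
        counts.append(len(states))
    return counts

def estimate_expansion(q: float, depth: int = 8, trials: int = 5000, seed: int = 0) -> list:
    """Estimate gamma_i = E(J_i) / E(J_{i-1}) by averaging J_i over random start values."""
    rng = random.Random(seed)
    totals = [0] * (depth + 1)
    for _ in range(trials):
        # Assumption: u is drawn uniformly on [0, 1); a real DAC codeword follows f(u).
        for i, j in enumerate(path_counts(rng.random(), q, depth)):
            totals[i] += j
    return [totals[i] / totals[i - 1] for i in range(1, depth + 1)]

print(estimate_expansion(q=0.7))
```

Under the uniform-start assumption, the estimated ratios should all lie close to $2q$, since the uniform density is the fixed point of the time spectrum.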

VI. Conclusion 

This paper studies an important problem: how many paths will be created as the DAC
decoding proceeds? To answer this question, the author introduces the concepts of path spectrum,
time spectrum, and expansion factor. The relations between time spectrum, path spectrum, and
expansion factor are revealed. A numeric method to calculate the time spectrum is proposed. The
given experimental and theoretical results coincide with each other perfectly. In the future, the
author will continue this work and study another important problem: what is the PDF of
the Hamming distances between decoding paths and the source?